Data Structures for Maintaining Set Partitions

Authors

  • Michael A. Bender
  • Saurabh Sethia
  • Steven Skiena
Abstract

Each test or feature in a classification system defines a set partition on a class of objects. Adding new features refines the classification, whereas deleting features may result in merging previously distinguished classes. As an illustration, consider the set of automobile types { VW Beetle, Toyota, Lexus, Cadillac }. The feature size partitions the cars into sets of small and large cars, {{ VW Beetle, Toyota }, { Lexus, Cadillac }}. The feature domestic-origin partitions the cars into {{ VW Beetle, Toyota, Lexus }, { Cadillac }}. The feature ugly-shape distinguishes { VW Beetle, Cadillac } from { Toyota, Lexus }. Incorporating both size and origin induces the refined partition {{ VW Beetle, Toyota }, { Lexus }, { Cadillac }}, whereas the union of all three features completely distinguishes the types of cars. In fact, size and ugly-shape are sufficient for complete identification, so domestic-origin could be deleted from the set of features without affecting the induced partition.

Efficiently maintaining the partition induced by a set of features is an important problem in building decision tree classifiers. For example, in building an optical character recognition (OCR) system [15,16] based on point-probe decision trees [1], each of the 1500-plus pixels in each character-sized window of the image may be evaluated as a possible feature. An important goal is to find a small, robust set of probe points sufficient to distinguish among the 70-plus characters in a font, a process that may require repeatedly inserting and deleting features to see the impact on the final classification.

In this paper, we introduce techniques to speed up this process of feature identification. We propose a series of data structures for maintaining a collection of set partitions on elements U = {1, ..., n}. The data structures efficiently support the following three operations:
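The automobile example in the abstract can be checked with a minimal sketch. This is a naive recomputation from scratch, not the data structures proposed in the paper: it groups elements by their tuple of feature values, so two cars share a block exactly when every selected feature agrees on them. The function name `induced_partition` and the dictionary encoding of each feature are assumptions made for illustration only.

```python
from collections import defaultdict


def induced_partition(elements, features):
    """Return the partition induced by a list of features.

    Each feature is a dict mapping an element to its feature value
    (a hypothetical encoding chosen for this sketch). Elements with
    identical value tuples land in the same block.
    """
    blocks = defaultdict(list)
    for x in elements:
        signature = tuple(f[x] for f in features)
        blocks[signature].append(x)
    return {frozenset(block) for block in blocks.values()}


cars = ["VW Beetle", "Toyota", "Lexus", "Cadillac"]

# Feature values taken from the example in the abstract.
size = {"VW Beetle": "small", "Toyota": "small",
        "Lexus": "large", "Cadillac": "large"}
domestic_origin = {"VW Beetle": False, "Toyota": False,
                   "Lexus": False, "Cadillac": True}
ugly_shape = {"VW Beetle": True, "Toyota": False,
              "Lexus": False, "Cadillac": True}

size_and_origin = induced_partition(cars, [size, domestic_origin])
size_and_shape = induced_partition(cars, [size, ugly_shape])
all_three = induced_partition(cars, [size, domestic_origin, ugly_shape])
```

Here `size_and_origin` has three blocks, while `size_and_shape` and `all_three` both consist of four singletons, which matches the observation that domestic-origin can be deleted without changing the induced partition. Recomputing like this costs time proportional to the number of elements times the number of features on every change, which is the cost a dedicated structure for inserting and deleting features would aim to avoid.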

Similar resources

Stirling number of the fourth kind and lucky partitions of a finite set

The concept of Lucky k-polynomials and in particular Lucky χ-polynomials was recently introduced. This paper introduces Stirling number of the fourth kind and Lucky partitions of a finite set in order to determine either the Lucky k- or Lucky χ-polynomial of a graph. The integer partitions influence Stirling partitions of the second kind.

k-Efficient partitions of graphs

A set $S = \{u_1, u_2, \ldots, u_t\}$ of vertices of $G$ is an efficient dominating set if every vertex of $G$ is dominated exactly once by the vertices of $S$. Letting $U_i$ denote the set of vertices dominated by $u_i$, we note that $\{U_1, U_2, \ldots, U_t\}$ is a partition of the vertex set of $G$ and that each $U_i$ contains the vertex $u_i$ and all the vertices at distance 1 from it in $G$. In this ...

A Multidimensional Data Structure for Maintaining XML Data Partitions

To achieve good performance of processing queries on huge XML data in cluster machines, data partitioning and placement strategy is one of the key factors. In this paper we propose a multidimensional data structure for maintaining XML data partitions, specifically for holistic twig join processing. Initially, we construct the multidimensional data structure from statistical information on vario...

Post-Processing Partitions to Identify Domains of Modularity Optimization

We introduce the Convex Hull of Admissible Modularity Partitions (CHAMP) algorithm to prune and prioritize different network community structures identified across multiple runs of possibly various computational heuristics. Given a set of partitions, CHAMP identifies the domain of modularity optimization for each partition, i.e., the parameter-space domain where it has the largest modularity rel...

Journal title:
  • Random Struct. Algorithms

Volume 25  Issue

Pages  -

Publication date: 2000